What are the contributions of this study ?
Write a brief summary (no more than 1.5 pages) that discusses the contributions of this study. You might wish to consider the following questions: What are the objectives, and how do they relate to the established literature? What is novel about the empirical strategy? What are the results and how convincing are these?
Answer 1: This paper studies how changes that make unemployment systems more generous affect the duration of unemployment. A considerable theoretical literature has shown that a more generous unemployment insurance system (through a higher RR and/or a longer PBD) will reduce the optimal job search effort of an unemployed worker and hence result in longer unemployment duration. That is also confirmed by the predictions of job search theory, which states, among other things, that as the duration of unemployment increases, individuals may lower their reservation wage due to financial pressures and the desire to avoid long-term unemployment. If these pressures are relieved by increasing the RR and/or the PBD, it is understandable that the duration of unemployment may increase. About the job search theory there have been many studies in the last decade.
In this study, as I anticipated, Lalive, van Ours, and Zweimüller investigate how changes in financial incentives, specifically the benefit replacement rate (RR) and potential benefit duration (PBD) of unemployment insurance, impact the duration of unemployment. The objectives of the study are to understand the effects of these key parameters on unemployment duration and to provide insights for policymakers on designing effective unemployment insurance policies.
The study contributes to the existing literature by focusing on the interplay between RR and PBD, which is a relatively understudied area. While previous research has examined the individual effects of RR or PBD changes on unemployment behavior, this study uniquely explores the combined impact of these parameters. By analyzing data from Austria, the authors are able to leverage a natural experiment that occurred in the late 1980s, providing a valuable opportunity to assess the effects of simultaneous changes in RR and PBD.
One of the key novelties of the empirical strategy is the use of longitudinal individual data from Austrian social security databases, allowing for a detailed analysis of unemployment spells and transitions. The study carefully constructs groups of job seekers based on income, age, and work experience criteria to isolate the effects of RR and PBD changes. This meticulous approach enhances the credibility of the results and strengthens the internal validity of the study.
The results of the study reveal several important findings. The authors find that increases in PBD have a more substantial impact on unemployment duration compared to changes in RR. Older workers exhibit stronger reactions to PBD extensions, highlighting the importance of considering age-related differences in policy design. The study also demonstrates that the combined effect of simultaneous changes in RR and PBD differs from the sum of individual effects, suggesting potential interaction effects between these parameters.
Overall, the results of the study are robust and provide valuable insights for policymakers. The findings underscore the significance of PBD in influencing unemployment behavior and suggest that adjusting the potential duration of benefits may be a more effective policy tool than altering benefit levels. By shedding light on the relative importance of RR and PBD in shaping unemployment outcomes, this study offers practical implications for designing efficient unemployment insurance systems.
We begin with some preliminary steps, loading the libraries and the data
# Libraries loading
library(foreign)
library(survival)
library(tidyverse)
library(dplyr)
library(survminer)
library(gt)
Then we check for the dimension of our dataset and for the presence of NA’s.
# Data loading
udat <- read.dta("fi.dta")
udat <- udat[,1:134] ## get rid of some superfluous variables
udat <- as_tibble(udat)
dim(udat)
## [1] 225821 134
sum(is.na(udat))
## [1] 0
# Data Visualization
glimpse(udat[,1:36])
## Rows: 225,821
## Columns: 36
## $ beginn <dbl> 10412, 10169, 10624, 10343, 10654, 10101, 10652, 10338, 10705…
## $ ein_zus <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0…
## $ sfrau <dbl> 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0…
## $ age <dbl> 51.13758, 54.71595, 54.04791, 54.11636, 46.13005, 42.11909, 5…
## $ after <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ dur <dbl> 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428571, 0.1428…
## $ nwage_pj <dbl> 11282.729, 8577.623, 6828.051, 10071.884, 6342.333, 6094.196,…
## $ e3_5 <dbl> 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ bdur <dbl> 30, 30, 30, 20, 30, 30, 30, 20, 30, 30, 30, 30, 30, 30, 30, 3…
## $ y1988 <dbl> 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1…
## $ y1989 <dbl> 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0…
## $ y1990 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ y1991 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ med_educ <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ hi_educ <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ lehre <dbl> 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0…
## $ married <dbl> 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1…
## $ single <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ divorced <dbl> 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0…
## $ f_marr <dbl> 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0…
## $ f_single <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ f_divor <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ bc <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1…
## $ pnon_10 <dbl> 0.2651971579, 0.1003559679, 0.1081599146, 0.3736308813, 0.033…
## $ q2 <dbl> 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1…
## $ q3 <dbl> 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0…
## $ q4 <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0…
## $ seasonal <dbl> 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0…
## $ manuf <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0…
## $ ten72 <dbl> -3.47749186, -1.04384339, 0.23723496, -2.63423371, -2.5197806…
## $ type <fct> PBD and RR, PBD and RR, PBD and RR, PBD and RR, PBD and RR, P…
## $ uncc <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ lwage <dbl> 6.173206, 5.918063, 5.645761, 6.059680, 5.571968, 5.576242, 5…
## $ tr <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ t39 <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0…
## $ t52 <dbl> 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1…
#head(data)
#summary(data)
This dataset contains many attributes, some of the most important are:
Here you can find a more in-depth list of the variables used and what they represent:
table(udat$type)
##
## PBD and RR PBD RR control
## 21174 99404 32470 72773
summary(udat$dur)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0712 6.8387 11.6826 18.5494 19.9456 644.0881
## Computation of average spells when durations are truncated at 104 weeks
udat %>%
mutate(dur104 = dur,
dur104 = ifelse(dur104 > 104, 104, dur104)) ->
udat
(difference-in-differences) Attempt to replicate Table 4 of the paper:
Answer 2:
basic = udat %>%
group_by(type, after) %>%
summarize(mean = round(mean(dur104),2),
n = n(),
se = round((sd(dur104) / sqrt(n)),2))
basic
## # A tibble: 8 × 5
## # Groups: type [4]
## type after mean n se
## <fct> <dbl> <dbl> <int> <dbl>
## 1 PBD and RR 0 18.5 11992 0.16
## 2 PBD and RR 1 22.7 9182 0.23
## 3 PBD 0 15.8 48294 0.08
## 4 PBD 1 18.1 51110 0.09
## 5 RR 0 17.1 17160 0.12
## 6 RR 1 19.1 15310 0.15
## 7 control 0 14.5 33815 0.08
## 8 control 1 15.6 38958 0.09
diff_in_diff <- udat %>%
group_by(type, after) %>%
summarize(mean_outcome = mean(dur104),
n_obs = n(),
se_outcome = sd(dur104) / sqrt(n_obs)) %>%
pivot_wider(names_from = after, values_from = c(mean_outcome, se_outcome, n_obs)) %>%
mutate(change_mean = `mean_outcome_1` - `mean_outcome_0`, change_se = `se_outcome_1` - `se_outcome_0`)
diff_in_diff = diff_in_diff %>%
mutate(DiD_mean = change_mean - as.double(diff_in_diff[4,8]))
#print(diff_in_diff)
diff_in_diff1 <- data.frame(
Group = levels(udat$type),
before_mean = diff_in_diff$mean_outcome_0,
before_se = diff_in_diff$se_outcome_0,
before_n = diff_in_diff$n_obs_0,
after_mean = diff_in_diff$mean_outcome_1,
after_se = diff_in_diff$se_outcome_1,
after_n = diff_in_diff$n_obs_1,
change_mean= diff_in_diff$change_mean,
change_se= diff_in_diff$change_se,
DiD_mean= diff_in_diff$DiD_mean
)
knitr::kable(diff_in_diff1, align = "c", digits = 2, caption = "My Table for DiD")
| Group | before_mean | before_se | before_n | after_mean | after_se | after_n | change_mean | change_se | DiD_mean |
|---|---|---|---|---|---|---|---|---|---|
| PBD and RR | 18.49 | 0.16 | 11992 | 22.74 | 0.23 | 9182 | 4.25 | 0.07 | 3.08 |
| PBD | 15.83 | 0.08 | 48294 | 18.08 | 0.09 | 51110 | 2.25 | 0.02 | 1.08 |
| RR | 17.11 | 0.12 | 17160 | 19.10 | 0.15 | 15310 | 1.99 | 0.03 | 0.82 |
| control | 14.46 | 0.08 | 33815 | 15.63 | 0.09 | 38958 | 1.17 | 0.01 | 0.00 |
levels(udat$type)
## [1] "PBD and RR" "PBD" "RR" "control"
udat$type <- relevel(udat$type, ref = "control")
levels(udat$type)
## [1] "control" "PBD and RR" "PBD" "RR"
lm1=lm(dur104~type+after, data=udat)
summary(lm1)
##
## Call:
## lm(formula = dur104 ~ type + after, data = udat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.421 -9.953 -4.967 2.869 90.009
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.99138 0.07722 181.19 <2e-16 ***
## typePBD and RR 5.45200 0.13913 39.19 <2e-16 ***
## typePBD 1.94545 0.08681 22.41 <2e-16 ***
## typeRR 3.08791 0.11883 25.98 <2e-16 ***
## after 2.04901 0.07503 27.31 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.79 on 225816 degrees of freedom
## Multiple R-squared: 0.01062, Adjusted R-squared: 0.0106
## F-statistic: 605.7 on 4 and 225816 DF, p-value: < 2.2e-16
coef(lm1)[3:4] + coef(lm1)[1]
## typePBD typeRR
## 15.93683 17.07929
coef(lm1)[3:4] + coef(lm1)[1] + coef(lm1)["after"]
## typePBD typeRR
## 17.98584 19.12830
A small note regarding this point: in the solutions proposed by the teacher the same result comes out as this model, which I imagine is the model used. However, perhaps it would be more accurate to add an interaction term between type and after: in this case the model would look like this:
lm2=lm(dur104~type*after, data=udat)
summary(lm2)
##
## Call:
## lm(formula = dur104 ~ type * after, data = udat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.669 -9.933 -4.864 2.860 89.538
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.46226 0.09673 149.518 < 2e-16 ***
## typePBD and RR 4.02607 0.18904 21.297 < 2e-16 ***
## typePBD 1.37176 0.12612 10.876 < 2e-16 ***
## typeRR 2.64590 0.16671 15.871 < 2e-16 ***
## after 1.16942 0.13220 8.846 < 2e-16 ***
## typePBD and RR:after 3.08199 0.27985 11.013 < 2e-16 ***
## typePBD:after 1.07954 0.17383 6.210 5.3e-10 ***
## typeRR:after 0.81838 0.23786 3.441 0.00058 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.79 on 225813 degrees of freedom
## Multiple R-squared: 0.01117, Adjusted R-squared: 0.01114
## F-statistic: 364.5 on 7 and 225813 DF, p-value: < 2.2e-16
Another way to obtain the coefficient would be to suppress the constant of the linear model:
lm2=lm(dur104~type+after- 1, data=udat)
summary(lm2)
##
## Call:
## lm(formula = dur104 ~ type + after - 1, data = udat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.421 -9.953 -4.967 2.869 90.009
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## typecontrol 13.99138 0.07722 181.19 <2e-16 ***
## typePBD and RR 19.44338 0.12652 153.67 <2e-16 ***
## typePBD 15.93683 0.06836 233.14 <2e-16 ***
## typeRR 17.07929 0.10488 162.84 <2e-16 ***
## after 2.04901 0.07503 27.31 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.79 on 225816 degrees of freedom
## Multiple R-squared: 0.4756, Adjusted R-squared: 0.4756
## F-statistic: 4.096e+04 on 5 and 225816 DF, p-value: < 2.2e-16
coef(lm2)[3:4]
## typePBD typeRR
## 15.93683 17.07929
coef(lm2)[3:4] + coef(lm2)["after"]
## typePBD typeRR
## 17.98584 19.12830
(Survival Functions) Seek to reproduce Figure 3 in Lalive et al. (2006).
Answer 3:
line_type=c("dashed", "dashed", "dashed", "dashed","solid", "solid","solid","solid")
kmcurve <- survfit(Surv(dur104, uncc) ~ after, data = udat)
ggsurvplot(kmcurve,
ggtheme = theme_minimal(),
linetype = line_type,
censor.size = 0.1,
size = 0.5,
palette = c("#007BB9", "#50677D"),
legend.labs = c("Before 1989", "After 1989"),
facet.by = "type") +
guides(linetype = 'none') +
labs(
x = "Unemployment duration (weeks)",
y = "survival probability",
subtitle = "Kaplan-Meier survivor functions"
) +
theme_minimal() +
theme(
legend.position = "top",
plot.subtitle = element_text(
size = 16L,
hjust = 0.5
),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank()
) +
scale_color_manual(values = c("#007BB9", "#50677D"), labels = c("Before 1989", "After 1989")) +
facet_wrap(vars(type))
(Survival Functions) Seek to reproduce Figure 4 in Lalive et al. (2006).
Answer 4:
library(muhaz)
library(biostat3)
library(bshazard)
hazards <- udat %>%
group_by(type, after) %>%
do(as.data.frame.muhaz(muhaz(.$dur104, .$uncc, min.time = 0, max.time = 104, bw.method = "global", b.cor = "none"))) %>%
ungroup()
plot1=ggplot(hazards) +
aes(x = x, y = y, group = factor(after)) +
geom_line(aes(linetype = factor(after), color = factor(after)), linewidth = 0.8) +
scale_linetype_manual(values = c("dashed", "solid")) +
scale_color_manual(values = c( "#007BB9", "#50677D"),labels = c("Before 1989", "After 1989")) +
labs(
x = "Unemployment duration (weeks)",
y = "Hazard",
subtitle = "Kaplan–Meier unemployment exit rates",
color = element_blank(),
linetype = element_blank()
) +
theme_minimal() +
theme(
legend.position = "top",
plot.subtitle = element_text(
size = 16L,
hjust = 0.5
),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank()
) +
facet_wrap(vars(type)) +
xlim(0, 100) +
ylim(0, 0.15)+
guides(linetype = FALSE)
plot(plot1)
As we can see, the function presents some jumps and, although it exhibits similar results, it does not fully reflect Figure 4 of the paper, which means that we can still make a small correction: in fact, when we truncate the duration variable dur, creating dur104, the uncensored variable uncc should also be updated, as it must take into account the truncation we are introducing into the data. If we make this correction we obtain:
table_before=as.data.frame(table(udat$uncc))
table_before %>%
tibble::as_tibble() %>%
gt::gt() %>%
gt::tab_header(
title = "Variable uncc BEFORE updating")
| Variable uncc BEFORE updating | |
| Var1 | Freq |
|---|---|
| 0 | 9478 |
| 1 | 216343 |
udat$uncc=(ifelse(udat$dur>104, 0, udat$uncc))
table_after=as.data.frame(table(udat$uncc))
table_after %>%
tibble::as_tibble() %>%
gt::gt() %>%
gt::tab_header(
title = "Variable uncc AFTER updating")
| Variable uncc AFTER updating | |
| Var1 | Freq |
|---|---|
| 0 | 11866 |
| 1 | 213955 |
Let’s redo question 4 with this corrected version of the dataset: In this case we obtain a result that is more similar to the one in the paper, because now the variable uncc is updated taking into consideration the truncation made on dur (we are using, in fact, dur104). We can notice that we don’t have anymore the ‘jumps’, but the function is now continuous and well defined across the entire domain.
hazards <- udat %>%
group_by(type, after) %>%
do(as.data.frame.muhaz(muhaz(.$dur104, .$uncc, min.time = 0,max.time=104, bw.method = "global", b.cor = "none"))) %>%
ungroup()
plot2=ggplot(hazards) +
aes(x = x, y = y, group = factor(after)) +
geom_line(aes(linetype = factor(after), color = factor(after)), linewidth = 0.8) +
scale_linetype_manual(values = c("dashed", "solid")) +
scale_color_manual(values = c( "#007BB9", "#50677D"),labels = c("Before 1989", "After 1989")) +
labs(
x = "Unemployment duration (weeks)",
y = "Hazard",
#title = "Figure 4",
subtitle = "Kaplan–Meier unemployment exit rates",
color = element_blank(),
linetype = element_blank()
) +
theme_minimal() +
theme(
legend.position = "top",
plot.subtitle = element_text(
size = 16L,
hjust = 0.5
),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank()
) +
facet_wrap(vars(type)) +
xlim(0, 100) +
ylim(0, 0.15)+
guides(linetype = FALSE)
plot(plot2)
To better visualize the difference between these plots, I compare them with each other:
Estimate the causal treatment effect in a PH model.
Answer 5:
udat %>%
mutate(all = tr * (t39 + t52) ) ->
udat
breaks <- seq(from=3,to=59, by=4)
labels <- paste("(", c(0,breaks), ",", c(breaks,104), "]",sep="")
gux <- survSplit(Surv(dur104,uncc) ~., data=udat, cut = breaks,
end = "time", event="death", start="start", episode="interval")
gux %>%
mutate(exposure = time - start,
interval=factor(interval+1, labels = labels) ) ->
gux
pwe <- coxph(Surv(exposure, death) ~ interval + age + factor(single) + factor(married) + factor(divorced) + factor(sfrau) + factor(f_marr) + factor(f_single) + factor(f_divor) + factor(lehre) + factor(med_educ) + factor(hi_educ) + nwage_pj + factor(bc) + lwage + ten72 + pnon_10 + factor(seasonal) + factor(manuf) + factor(y1988) + factor(y1989) + factor(y1990) + factor(y1991) + factor(q2) + factor(q3) + factor(q4) + interval:tr + interval:t39 + interval:t52 + interval:all + interval:after0 + interval:tr_a0 + interval:t39_a0 + interval:t52_a0 + interval:t39tra0 + interval:t52tra0, data = gux)
library(stargazer)
stargazer(pwe,
dep.var.caption="",dep.var.labels="",
keep=1:15,
omit.table.layout = "n", star.cutoffs = NA,
keep.stat=c("n", "ll"),no.space=TRUE,
header=FALSE,
title="The PWE model", type="text"
)
##
## The PWE model
## ===============================
##
## -------------------------------
## interval(3,7] 0.631
## (0.022)
## interval(7,11] 1.093
## (0.022)
## interval(11,15] 1.337
## (0.023)
## interval(15,19] 1.380
## (0.025)
## interval(19,23] 1.508
## (0.027)
## interval(23,27] 1.203
## (0.034)
## interval(27,31] 1.330
## (0.037)
## interval(31,35] 1.196
## (0.045)
## interval(35,39] 0.818
## (0.059)
## interval(39,43] 0.684
## (0.069)
## interval(43,47] 0.573
## (0.079)
## interval(47,51] 0.441
## (0.091)
## interval(51,55] 0.313
## (0.103)
## interval(55,59] 0.223
## (0.114)
## interval(59,104] 0.201
## (0.064)
## -------------------------------
## Observations 1,057,906
## Log Likelihood -2,879,250.000
## ===============================
Comparing the coefficients of all the variables (just by commenting ‘keep=1:15’), we can see that they are very similar to those obtained in the paper.
Having reached this point we conclude the work with some final considerations on the work carried out. Replicating an academic paper that features data usage and does not report all the formulas and code used to obtain certain results can be very complicated and treacherous. On the one hand, in fact, there is the problem of understanding in detail what the authors are trying to do, since every slightest difference in thought and therefore in implementation can generate large differences in the final output; on the other hand, you run the risk of making imperfections related to the more practical part of writing the code, and of using different algorithms and models, once again compromising the final result.